The goal is to forecast the requested_qty amounts for June and July 2020. Currently, sell_in_qty and requested_qty are null for these two months.
To predict the requested quantity and sell-in quantity, I am going to use the Long Short-Term Memory (LSTM) method, a popular deep learning architecture for sequence data. We will use Keras to implement the LSTM.
First, we will start by importing required libraries.
# Importing dependencies
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from pathlib import Path
%matplotlib inline
Here, I am importing the given data from the CSV file.
product_csv_path = Path("one_bu_test_train_dataset.csv")
# Loading the datasets
product = pd.read_csv(product_csv_path, index_col="monthly_date", infer_datetime_format=True, parse_dates=True)  # note: infer_datetime_format is deprecated (and a no-op) in pandas >= 2.0
product.head()
| monthly_date | product_id | Business Unit | Business Group | Major Product Group | Product Group | lineage | month | avg_map_discount | discount_avg | price | days_discount | days_map_discount | sell_in_qty | requested_qty | discount_max | max_map_discount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-04-01 | 534 | 2 | 9052 | E43 | 3199 | 7131 | 4 | 0.0 | 0.0 | 66.86793 | 0.0 | 0.0 | 417.000000 | 507.000000 | 0.0 | 0.0 |
| 2019-12-01 | 534 | 2 | 9052 | E43 | 3199 | 7131 | 12 | 0.0 | 0.0 | 66.86793 | 0.0 | 0.0 | 87.000000 | 87.000000 | 0.0 | 0.0 |
| 2020-02-01 | 534 | 2 | 9052 | E43 | 3199 | 7131 | 2 | 0.0 | 0.0 | 66.86793 | 0.0 | 0.0 | 12.428571 | 27.857143 | 0.0 | 0.0 |
| 2020-03-01 | 534 | 2 | 9052 | E43 | 3199 | 7131 | 3 | 0.0 | 0.0 | 66.86793 | 0.0 | 0.0 | 78.000000 | 102.000000 | 0.0 | 0.0 |
| 2019-11-01 | 534 | 2 | 9052 | E43 | 3199 | 7131 | 11 | 0.0 | 0.0 | 66.86793 | 0.0 | 0.0 | 19.714286 | 23.142857 | 0.0 | 0.0 |
This is what our data looks like. From the dataframe above it is evident that the data is not sorted by monthly date. For a time series forecasting problem, the first rule is that the data must be sorted in ascending order by date, so we sort by monthly_date.
# Sorting the dataset
product = product.sort_values(by="monthly_date")
After sorting, this is how our dataframe looks.
product
| monthly_date | product_id | Business Unit | Business Group | Major Product Group | Product Group | lineage | month | avg_map_discount | discount_avg | price | days_discount | days_map_discount | sell_in_qty | requested_qty | discount_max | max_map_discount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-12-01 | 561 | 2 | 9052 | O35 | 7745 | 7211 | 12 | 0.000000 | 0.0 | 2.060415 | 0.0 | 0.000000 | 32.571429 | 32.571429 | 0.0 | 0.000000 |
| 2017-12-01 | 565 | 2 | 9052 | O35 | 7745 | 7217 | 12 | 0.000000 | 0.0 | 1.443059 | 0.0 | 0.000000 | 61.714286 | 61.714286 | 0.0 | 0.000000 |
| 2017-12-01 | 510 | 2 | 9052 | O35 | 7746 | 7027 | 12 | 0.000000 | 0.0 | 16.206844 | 0.0 | 0.000000 | 43.714286 | 80.000000 | 0.0 | 0.000000 |
| 2017-12-01 | 679 | 2 | 9052 | E41 | 3183 | 7452 | 12 | 0.150075 | 0.0 | 72.614650 | 0.0 | 0.580645 | 4.853439 | 0.285714 | 0.0 | 0.150075 |
| 2017-12-01 | 820 | 2 | 9052 | O36 | 3190 | 7845 | 12 | 0.000000 | 0.0 | 2.817612 | 0.0 | 0.000000 | 28.285714 | 28.285714 | 0.0 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-07-01 | 683 | 2 | 9052 | E42 | 3192 | 7462 | 7 | 0.000000 | 0.0 | 1.846408 | 0.0 | 0.000000 | NaN | NaN | 0.0 | 0.000000 |
| 2020-07-01 | 540 | 2 | 9052 | O35 | 3185 | 7147 | 7 | 0.000000 | 0.0 | 8.677368 | 0.0 | 0.000000 | NaN | NaN | 0.0 | 0.000000 |
| 2020-07-01 | 674 | 2 | 9052 | E41 | 3182 | 7436 | 7 | 0.000000 | 0.0 | 14.070611 | 0.0 | 0.000000 | NaN | NaN | 0.0 | 0.000000 |
| 2020-07-01 | 821 | 2 | 9052 | O36 | 3190 | 7847 | 7 | 0.000000 | 0.0 | 2.788491 | 0.0 | 0.000000 | NaN | NaN | 0.0 | 0.000000 |
| 2020-07-01 | 620 | 2 | 9052 | E42 | 3192 | 7308 | 7 | 0.000000 | 0.0 | 1.740552 | 0.0 | 0.000000 | NaN | NaN | 0.0 | 0.000000 |
6671 rows × 16 columns
Our dataframe has 6,671 observations of product data spanning roughly three years (December 2017 to July 2020).
Now let us look at a summary of the dataframe. For this we use the .info() method, which prints the index dtype, the column dtypes, the non-null counts, and the memory usage.
product.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6671 entries, 2017-12-01 to 2020-07-01
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   product_id           6671 non-null   int64
 1   Business Unit        6671 non-null   int64
 2   Business Group       6671 non-null   int64
 3   Major Product Group  6671 non-null   object
 4   Product Group        6671 non-null   int64
 5   lineage              6671 non-null   int64
 6   month                6671 non-null   int64
 7   avg_map_discount     6671 non-null   float64
 8   discount_avg         6671 non-null   float64
 9   price                6671 non-null   float64
 10  days_discount        6671 non-null   float64
 11  days_map_discount    6671 non-null   float64
 12  sell_in_qty          6288 non-null   float64
 13  requested_qty        6288 non-null   float64
 14  discount_max         6671 non-null   float64
 15  max_map_discount     6671 non-null   float64
dtypes: float64(9), int64(6), object(1)
memory usage: 886.0+ KB
Here, I am checking the shape of the dataframe.
product.shape
(6671, 16)
product.tail()
| monthly_date | product_id | Business Unit | Business Group | Major Product Group | Product Group | lineage | month | avg_map_discount | discount_avg | price | days_discount | days_map_discount | sell_in_qty | requested_qty | discount_max | max_map_discount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-07-01 | 683 | 2 | 9052 | E42 | 3192 | 7462 | 7 | 0.0 | 0.0 | 1.846408 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 |
| 2020-07-01 | 540 | 2 | 9052 | O35 | 3185 | 7147 | 7 | 0.0 | 0.0 | 8.677368 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 |
| 2020-07-01 | 674 | 2 | 9052 | E41 | 3182 | 7436 | 7 | 0.0 | 0.0 | 14.070611 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 |
| 2020-07-01 | 821 | 2 | 9052 | O36 | 3190 | 7847 | 7 | 0.0 | 0.0 | 2.788491 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 |
| 2020-07-01 | 620 | 2 | 9052 | E42 | 3192 | 7308 | 7 | 0.0 | 0.0 | 1.740552 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 |
# Correlation (note: on pandas >= 2.0 this needs product.corr(numeric_only=True),
# since Major Product Group is a non-numeric column)
plt.figure(figsize = (20, 10))
sns.heatmap(product.corr(), annot = True);
There is no strong correlation visible between any columns.
Our goal is to forecast the monthly sell-in quantity and the monthly requested quantity. So we aggregate the data at the monthly level, summing the sell-in quantity and the requested quantity across products.
Let us sum the sell-in quantity first.
#groupby date and sum the sales
product_sales = product.groupby('monthly_date').sell_in_qty.sum().reset_index()
product_sales
|   | monthly_date | sell_in_qty |
|---|---|---|
| 0 | 2017-12-01 | 6455.159707 |
| 1 | 2018-01-01 | 305853.773321 |
| 2 | 2018-02-01 | 355411.037870 |
| 3 | 2018-03-01 | 289141.505604 |
| 4 | 2018-04-01 | 276667.021410 |
| 5 | 2018-05-01 | 317928.140667 |
| 6 | 2018-06-01 | 282536.496024 |
| 7 | 2018-07-01 | 270871.260218 |
| 8 | 2018-08-01 | 323579.714510 |
| 9 | 2018-09-01 | 316631.021185 |
| 10 | 2018-10-01 | 157359.243829 |
| 11 | 2018-11-01 | 295758.400315 |
| 12 | 2018-12-01 | 348175.282931 |
| 13 | 2019-01-01 | 323856.485921 |
| 14 | 2019-02-01 | 254740.442035 |
| 15 | 2019-03-01 | 456095.371668 |
| 16 | 2019-04-01 | 521940.245998 |
| 17 | 2019-05-01 | 428531.669332 |
| 18 | 2019-06-01 | 434096.404087 |
| 19 | 2019-07-01 | 698939.268621 |
| 20 | 2019-08-01 | 651601.935819 |
| 21 | 2019-09-01 | 442963.233835 |
| 22 | 2019-10-01 | 350025.409798 |
| 23 | 2019-11-01 | 324671.612968 |
| 24 | 2019-12-01 | 456584.470433 |
| 25 | 2020-01-01 | 307274.262608 |
| 26 | 2020-02-01 | 568144.374972 |
| 27 | 2020-03-01 | 597742.923509 |
| 28 | 2020-04-01 | 670535.527965 |
| 29 | 2020-05-01 | 465339.577604 |
| 30 | 2020-06-01 | 0.000000 |
| 31 | 2020-07-01 | 0.000000 |
Now let us sum the requested quantity.
#groupby date and sum the requested quantity
product_requested = product.groupby('monthly_date').requested_qty.sum().reset_index()
product_requested
|   | monthly_date | requested_qty |
|---|---|---|
| 0 | 2017-12-01 | 6740.428571 |
| 1 | 2018-01-01 | 331772.000002 |
| 2 | 2018-02-01 | 444620.714280 |
| 3 | 2018-03-01 | 339028.428569 |
| 4 | 2018-04-01 | 308394.857150 |
| 5 | 2018-05-01 | 353002.285709 |
| 6 | 2018-06-01 | 303604.285704 |
| 7 | 2018-07-01 | 287819.000000 |
| 8 | 2018-08-01 | 340184.857142 |
| 9 | 2018-09-01 | 349224.857135 |
| 10 | 2018-10-01 | 167510.428577 |
| 11 | 2018-11-01 | 305921.857150 |
| 12 | 2018-12-01 | 361389.857150 |
| 13 | 2019-01-01 | 337229.000003 |
| 14 | 2019-02-01 | 251940.857152 |
| 15 | 2019-03-01 | 533232.571421 |
| 16 | 2019-04-01 | 623762.285726 |
| 17 | 2019-05-01 | 484507.285697 |
| 18 | 2019-06-01 | 503502.142879 |
| 19 | 2019-07-01 | 724598.428563 |
| 20 | 2019-08-01 | 614254.285714 |
| 21 | 2019-09-01 | 492620.428564 |
| 22 | 2019-10-01 | 383541.714305 |
| 23 | 2019-11-01 | 311651.142861 |
| 24 | 2019-12-01 | 471218.000003 |
| 25 | 2020-01-01 | 363873.285724 |
| 26 | 2020-02-01 | 508112.714284 |
| 27 | 2020-03-01 | 690966.428568 |
| 28 | 2020-04-01 | 839775.571424 |
| 29 | 2020-05-01 | 584261.428579 |
| 30 | 2020-06-01 | 0.000000 |
| 31 | 2020-07-01 | 0.000000 |
Our dataset now shows the aggregated monthly sell-in quantity and requested quantity.
Next, we check our data for stationarity.
To use time series forecasting models, we need our time series to be stationary, i.e. to have a constant mean, a constant variance, and an autocovariance that depends only on the lag rather than on time.
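In symbols, weak stationarity of a series $y_t$ requires

$$
\mathbb{E}[y_t] = \mu, \qquad \mathrm{Var}(y_t) = \sigma^2, \qquad \mathrm{Cov}(y_t, y_{t+h}) = \gamma(h) \quad \text{for all } t,
$$

that is, the first two moments are constant and the autocovariance is a function of the lag $h$ only.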
So let us plot it and see.
import chart_studio.plotly as py
import plotly.offline as pyoff
import plotly.graph_objects as go
#plot monthly sales
plot_data = [
go.Scatter(
x=product_sales['monthly_date'],
y=product_sales['sell_in_qty'],
)
]
plot_layout = go.Layout(
title='Sell in quantity'
)
fig = go.Figure(data=plot_data, layout=plot_layout)
#pyoff.iplot(fig)
fig.show()
The series is not stationary: it has an increasing trend over the months, and toward June and July 2020 it drops to a flat line because the dataset has no values there.
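This visual judgment can be backed by a formal check. Below is a minimal sketch of an Augmented Dickey-Fuller test (the null hypothesis is that the series has a unit root, i.e. is non-stationary); note that `adfuller` lives in `statsmodels.tsa.stattools`, which this notebook does not otherwise import, and the slicing to drop the empty June/July rows is my own addition.

# Augmented Dickey-Fuller test on the raw monthly sell-in series
# (excluding the empty June/July 2020 rows so the placeholder zeros don't distort it)
from statsmodels.tsa.stattools import adfuller

adf_stat, p_value, *_ = adfuller(product_sales['sell_in_qty'][:-2].values)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")
# a p-value above 0.05 means we cannot reject the unit-root null,
# consistent with the trend visible in the plot

The same test can later be re-run on the differenced series to confirm that differencing removed the trend.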
#plot requested quantity
plot_data = [
go.Scatter(
x=product_requested['monthly_date'],
y=product_requested['requested_qty'],
)
]
plot_layout = go.Layout(
title='Requested quantity'
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)
The requested quantity series is likewise non-stationary, with an increasing trend over the months and, again, a flat line toward June and July where the values are missing.
Now let us plot the monthly sell-in quantity and requested quantity overlaid on each other.
product_sales_requested = pd.merge(product_sales, product_requested, how='inner', on = 'monthly_date')
product_sales_requested
|   | monthly_date | sell_in_qty | requested_qty |
|---|---|---|---|
| 0 | 2017-12-01 | 6455.159707 | 6740.428571 |
| 1 | 2018-01-01 | 305853.773321 | 331772.000002 |
| 2 | 2018-02-01 | 355411.037870 | 444620.714280 |
| 3 | 2018-03-01 | 289141.505604 | 339028.428569 |
| 4 | 2018-04-01 | 276667.021410 | 308394.857150 |
| 5 | 2018-05-01 | 317928.140667 | 353002.285709 |
| 6 | 2018-06-01 | 282536.496024 | 303604.285704 |
| 7 | 2018-07-01 | 270871.260218 | 287819.000000 |
| 8 | 2018-08-01 | 323579.714510 | 340184.857142 |
| 9 | 2018-09-01 | 316631.021185 | 349224.857135 |
| 10 | 2018-10-01 | 157359.243829 | 167510.428577 |
| 11 | 2018-11-01 | 295758.400315 | 305921.857150 |
| 12 | 2018-12-01 | 348175.282931 | 361389.857150 |
| 13 | 2019-01-01 | 323856.485921 | 337229.000003 |
| 14 | 2019-02-01 | 254740.442035 | 251940.857152 |
| 15 | 2019-03-01 | 456095.371668 | 533232.571421 |
| 16 | 2019-04-01 | 521940.245998 | 623762.285726 |
| 17 | 2019-05-01 | 428531.669332 | 484507.285697 |
| 18 | 2019-06-01 | 434096.404087 | 503502.142879 |
| 19 | 2019-07-01 | 698939.268621 | 724598.428563 |
| 20 | 2019-08-01 | 651601.935819 | 614254.285714 |
| 21 | 2019-09-01 | 442963.233835 | 492620.428564 |
| 22 | 2019-10-01 | 350025.409798 | 383541.714305 |
| 23 | 2019-11-01 | 324671.612968 | 311651.142861 |
| 24 | 2019-12-01 | 456584.470433 | 471218.000003 |
| 25 | 2020-01-01 | 307274.262608 | 363873.285724 |
| 26 | 2020-02-01 | 568144.374972 | 508112.714284 |
| 27 | 2020-03-01 | 597742.923509 | 690966.428568 |
| 28 | 2020-04-01 | 670535.527965 | 839775.571424 |
| 29 | 2020-05-01 | 465339.577604 | 584261.428579 |
| 30 | 2020-06-01 | 0.000000 | 0.000000 |
| 31 | 2020-07-01 | 0.000000 | 0.000000 |
#plot sales in quantity and requested quantity
plot_data = [
go.Scatter(
x=product_sales_requested['monthly_date'],
y=product_sales_requested['sell_in_qty'],
name='sale in quantity'
),
go.Scatter(
x=product_sales_requested['monthly_date'],
y=product_sales_requested['requested_qty'],
name='requested quantity'
)
]
plot_layout = go.Layout(
title='Sales vs Requested quantity'
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)
Now let us work with the sell-in quantity dataframe. We start by making the data stationary.
We can do this with differencing: we compute the change in sell-in quantity relative to the previous month and build the model on that difference.
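As an aside, pandas can produce the same month-over-month difference in a single call with `Series.diff()`; the shift-and-subtract version below is equivalent but keeps the intermediate prev_sell column visible. The `sales_diff_check` name here is just for illustration.

# equivalent one-liner using pandas' built-in differencing (for reference only;
# the notebook instead materialises prev_sell explicitly in the next cells)
sales_diff_check = product_sales['sell_in_qty'].diff()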
product_sales_diff = product_sales.copy()
#add previous sale in quantity to the next row
product_sales_diff['prev_sell'] = product_sales_diff['sell_in_qty'].shift(1)
product_sales_diff.head()
|   | monthly_date | sell_in_qty | prev_sell |
|---|---|---|---|
| 0 | 2017-12-01 | 6455.159707 | NaN |
| 1 | 2018-01-01 | 305853.773321 | 6455.159707 |
| 2 | 2018-02-01 | 355411.037870 | 305853.773321 |
| 3 | 2018-03-01 | 289141.505604 | 355411.037870 |
| 4 | 2018-04-01 | 276667.021410 | 289141.505604 |
#drop the null values and calculate the difference
product_sales_diff = product_sales_diff.dropna()
product_sales_diff['diff'] = (product_sales_diff['sell_in_qty'] - product_sales_diff['prev_sell'])
product_sales_diff.head(10)
|   | monthly_date | sell_in_qty | prev_sell | diff |
|---|---|---|---|---|
| 1 | 2018-01-01 | 305853.773321 | 6455.159707 | 299398.613614 |
| 2 | 2018-02-01 | 355411.037870 | 305853.773321 | 49557.264549 |
| 3 | 2018-03-01 | 289141.505604 | 355411.037870 | -66269.532266 |
| 4 | 2018-04-01 | 276667.021410 | 289141.505604 | -12474.484194 |
| 5 | 2018-05-01 | 317928.140667 | 276667.021410 | 41261.119257 |
| 6 | 2018-06-01 | 282536.496024 | 317928.140667 | -35391.644643 |
| 7 | 2018-07-01 | 270871.260218 | 282536.496024 | -11665.235806 |
| 8 | 2018-08-01 | 323579.714510 | 270871.260218 | 52708.454293 |
| 9 | 2018-09-01 | 316631.021185 | 323579.714510 | -6948.693325 |
| 10 | 2018-10-01 | 157359.243829 | 316631.021185 | -159271.777357 |
Now we have the dataframe for modelling the difference. Let us check for stationarity by plotting the data.
#plot sales diff
plot_data = [
go.Scatter(
x=product_sales_diff['monthly_date'],
y=product_sales_diff['diff'],
)
]
plot_layout = go.Layout(
title='Sell in quantity Diff'
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)
Now the differenced data looks stationary (the same ADF test from earlier, run on the diff column, can confirm this). So we can start building our feature set: we will use previous monthly differences to predict the next one, with a look-back period of 12 months.
#create new dataframe from transformation from time series to supervised
product_sales_supervised = product_sales_diff.drop(['prev_sell'],axis=1)
#adding lags
for inc in range(1, 13):
    field_name = 'lag_' + str(inc)
    product_sales_supervised[field_name] = product_sales_supervised['diff'].shift(inc)
product_sales_supervised.head(10)
|   | monthly_date | sell_in_qty | diff | lag_1 | lag_2 | lag_3 | lag_4 | lag_5 | lag_6 | lag_7 | lag_8 | lag_9 | lag_10 | lag_11 | lag_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2018-01-01 | 305853.773321 | 299398.613614 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2018-02-01 | 355411.037870 | 49557.264549 | 299398.613614 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2018-03-01 | 289141.505604 | -66269.532266 | 49557.264549 | 299398.613614 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2018-04-01 | 276667.021410 | -12474.484194 | -66269.532266 | 49557.264549 | 299398.613614 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 2018-05-01 | 317928.140667 | 41261.119257 | -12474.484194 | -66269.532266 | 49557.264549 | 299398.613614 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | 2018-06-01 | 282536.496024 | -35391.644643 | 41261.119257 | -12474.484194 | -66269.532266 | 49557.264549 | 299398.613614 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | 2018-07-01 | 270871.260218 | -11665.235806 | -35391.644643 | 41261.119257 | -12474.484194 | -66269.532266 | 49557.264549 | 299398.613614 | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | 2018-08-01 | 323579.714510 | 52708.454293 | -11665.235806 | -35391.644643 | 41261.119257 | -12474.484194 | -66269.532266 | 49557.264549 | 299398.613614 | NaN | NaN | NaN | NaN | NaN |
| 9 | 2018-09-01 | 316631.021185 | -6948.693325 | 52708.454293 | -11665.235806 | -35391.644643 | 41261.119257 | -12474.484194 | -66269.532266 | 49557.264549 | 299398.613614 | NaN | NaN | NaN | NaN |
| 10 | 2018-10-01 | 157359.243829 | -159271.777357 | -6948.693325 | 52708.454293 | -11665.235806 | -35391.644643 | 41261.119257 | -12474.484194 | -66269.532266 | 49557.264549 | 299398.613614 | NaN | NaN | NaN |
#drop null values
product_sales_supervised = product_sales_supervised.dropna().reset_index(drop=True)
product_sales_supervised
|   | monthly_date | sell_in_qty | diff | lag_1 | lag_2 | lag_3 | lag_4 | lag_5 | lag_6 | lag_7 | lag_8 | lag_9 | lag_10 | lag_11 | lag_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-01-01 | 323856.485921 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 | -6948.693325 | 52708.454293 | -11665.235806 | -35391.644643 | 41261.119257 | -12474.484194 | -66269.532266 | 49557.264549 | 299398.613614 |
| 1 | 2019-02-01 | 254740.442035 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 | -6948.693325 | 52708.454293 | -11665.235806 | -35391.644643 | 41261.119257 | -12474.484194 | -66269.532266 | 49557.264549 |
| 2 | 2019-03-01 | 456095.371668 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 | -6948.693325 | 52708.454293 | -11665.235806 | -35391.644643 | 41261.119257 | -12474.484194 | -66269.532266 |
| 3 | 2019-04-01 | 521940.245998 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 | -6948.693325 | 52708.454293 | -11665.235806 | -35391.644643 | 41261.119257 | -12474.484194 |
| 4 | 2019-05-01 | 428531.669332 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 | -6948.693325 | 52708.454293 | -11665.235806 | -35391.644643 | 41261.119257 |
| 5 | 2019-06-01 | 434096.404087 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 | -6948.693325 | 52708.454293 | -11665.235806 | -35391.644643 |
| 6 | 2019-07-01 | 698939.268621 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 | -6948.693325 | 52708.454293 | -11665.235806 |
| 7 | 2019-08-01 | 651601.935819 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 | -6948.693325 | 52708.454293 |
| 8 | 2019-09-01 | 442963.233835 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 | -6948.693325 |
| 9 | 2019-10-01 | 350025.409798 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 | -159271.777357 |
| 10 | 2019-11-01 | 324671.612968 | -25353.796831 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 | 138399.156486 |
| 11 | 2019-12-01 | 456584.470433 | 131912.857465 | -25353.796831 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 | 52416.882616 |
| 12 | 2020-01-01 | 307274.262608 | -149310.207825 | 131912.857465 | -25353.796831 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 | -24318.797010 |
| 13 | 2020-02-01 | 568144.374972 | 260870.112364 | -149310.207825 | 131912.857465 | -25353.796831 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 | -69116.043886 |
| 14 | 2020-03-01 | 597742.923509 | 29598.548537 | 260870.112364 | -149310.207825 | 131912.857465 | -25353.796831 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 | 201354.929633 |
| 15 | 2020-04-01 | 670535.527965 | 72792.604455 | 29598.548537 | 260870.112364 | -149310.207825 | 131912.857465 | -25353.796831 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 | 65844.874330 |
| 16 | 2020-05-01 | 465339.577604 | -205195.950361 | 72792.604455 | 29598.548537 | 260870.112364 | -149310.207825 | 131912.857465 | -25353.796831 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 | -93408.576666 |
| 17 | 2020-06-01 | 0.000000 | -465339.577604 | -205195.950361 | 72792.604455 | 29598.548537 | 260870.112364 | -149310.207825 | 131912.857465 | -25353.796831 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 | 5564.734755 |
| 18 | 2020-07-01 | 0.000000 | 0.000000 | -465339.577604 | -205195.950361 | 72792.604455 | 29598.548537 | 260870.112364 | -149310.207825 | 131912.857465 | -25353.796831 | -92937.824037 | -208638.701983 | -47337.332803 | 264842.864534 |
Now our features are ready. Let us check how well the lag features (lag_1 through lag_12) explain the variation in the diff column. This can be measured with adjusted R-squared: we fit a linear regression of diff on the lags and read off the adjusted R-squared.
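For reference, the adjusted R-squared penalises the plain $R^2$ for the number of predictors $k$ relative to the number of observations $n$:

$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}
$$

Because of this penalty the statistic can go negative when the lags explain less than they cost, which is exactly what happens with the smaller lag sets below.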
# Import statsmodels.formula.api
import statsmodels.formula.api as smf
# Define the regression formula
model = smf.ols(formula='diff ~ lag_1', data=product_sales_supervised)
# Fit the regression
model_fit = model.fit()
# Extract the adjusted r-squared
regression_adj_rsq = model_fit.rsquared_adj
print(regression_adj_rsq)
-0.054166440166043994
# Import statsmodels.formula.api
import statsmodels.formula.api as smf
# Define the regression formula
model = smf.ols(formula='diff ~ lag_1 + lag_2 + lag_3 + lag_4 + lag_5', data=product_sales_supervised)
# Fit the regression
model_fit = model.fit()
# Extract the adjusted r-squared
regression_adj_rsq = model_fit.rsquared_adj
print(regression_adj_rsq)
-0.15855098704295978
# Import statsmodels.formula.api
import statsmodels.formula.api as smf
# Define the regression formula
model = smf.ols(formula='diff ~ lag_1 + lag_2 + lag_3 + lag_4 + lag_5 + lag_6 + lag_7 + lag_8 + lag_9 + lag_10 + lag_11 + lag_12 ', data=product_sales_supervised)
# Fit the regression
model_fit = model.fit()
# Extract the adjusted r-squared
regression_adj_rsq = model_fit.rsquared_adj
print(regression_adj_rsq)
0.33392264519211623
The score is about 33%, a moderate value at best.
We use MinMaxScaler here to scale each feature to the range -1 to 1.
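With `feature_range=(-1, 1)`, scikit-learn's MinMaxScaler maps each feature column $x$ to

$$
x_{\text{scaled}} = 2\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1,
$$

where $x_{\min}$ and $x_{\max}$ are learned from the training set only, so test rows can land outside $[-1, 1]$, as we will see in `y_test` below.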
#import MinMaxScaler and create a new dataframe for LSTM model
from sklearn.preprocessing import MinMaxScaler
product_sales_model = product_sales_supervised.drop(['sell_in_qty','monthly_date'],axis=1)
Now we split the dataset into train and test sets. We train the model on the train set and then use the test set to predict the sales quantity for February through July 2020.
#split train and test set
train_set, test_set = product_sales_model[0:-6].values, product_sales_model[-6:].values
product_sales_model.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   diff    19 non-null     float64
 1   lag_1   19 non-null     float64
 2   lag_2   19 non-null     float64
 3   lag_3   19 non-null     float64
 4   lag_4   19 non-null     float64
 5   lag_5   19 non-null     float64
 6   lag_6   19 non-null     float64
 7   lag_7   19 non-null     float64
 8   lag_8   19 non-null     float64
 9   lag_9   19 non-null     float64
 10  lag_10  19 non-null     float64
 11  lag_11  19 non-null     float64
 12  lag_12  19 non-null     float64
dtypes: float64(13)
memory usage: 2.1 KB
#apply Min Max Scaler
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler = scaler.fit(train_set)
# reshape training set
train_set = train_set.reshape(train_set.shape[0], train_set.shape[1])
train_set_scaled = scaler.transform(train_set)
# reshape test set
test_set = test_set.reshape(test_set.shape[0], test_set.shape[1])
test_set_scaled = scaler.transform(test_set)
Now creating feature and label sets from scaled datasets.
X_train, y_train = train_set_scaled[:, 1:], train_set_scaled[:, 0:1]  # first column is the target (diff), the rest are the lags
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])  # LSTM expects input shaped [samples, timesteps, features]
X_test, y_test = test_set_scaled[:, 1:], test_set_scaled[:, 0:1]
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
import keras
from keras.layers import Dense
from keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from keras.layers import LSTM
from sklearn.model_selection import KFold, cross_val_score, train_test_split
# (dropped `from keras.utils import np_utils`: it is unused here and no longer exists in recent Keras)
Now let us fit our LSTM model.
model = Sequential()
model.add(LSTM(4, batch_input_shape=(1, X_train.shape[1], X_train.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=200, batch_size=1, verbose=1, shuffle=False)
Epoch 1/200
13/13 [==============================] - 13s 1ms/step - loss: 0.2154
Epoch 2/200
13/13 [==============================] - 0s 1ms/step - loss: 0.2257
Epoch 3/200
13/13 [==============================] - 0s 1ms/step - loss: 0.2210
...
Epoch 100/200
13/13 [==============================] - 0s 1ms/step - loss: 0.0051
...
Epoch 199/200
13/13 [==============================] - 0s 1ms/step - loss: 7.1233e-06
Epoch 200/200
13/13 [==============================] - 0s 1ms/step - loss: 6.4851e-06
<keras.callbacks.History at 0x7fe1fc9e64c0>
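The loss shrinks monotonically all the way to epoch 200, which on 13 training samples is a hint of overfitting rather than genuine learning. Since `EarlyStopping` was already imported, here is a hedged sketch of how it could be wired in, monitoring the training loss because the training set is too small to spare a validation split; this is my variation, not what the notebook runs.

# hypothetical variant: halt training once the loss stops improving for 10 epochs
early_stop = EarlyStopping(monitor='loss', patience=10, restore_best_weights=True)
model.fit(X_train, y_train, epochs=200, batch_size=1, verbose=1,
          shuffle=False, callbacks=[early_stop])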
Now let us run predictions on the test set and look at the results.
y_pred = model.predict(X_test,batch_size=1)
y_pred
array([[-0.4399965 ],
[-0.4851928 ],
[-0.5005657 ],
[-1.3372068 ],
[ 0.15739924],
[ 0.4152523 ]], dtype=float32)
y_test
array([[ 0.98321898],
[ 0.00632112],
[ 0.18877408],
[-0.98545772],
[-2.08431201],
[-0.11870401]])
These are still scaled differences, so to see the actual sales predictions we need to invert the scaling and then undo the differencing.
#reshape y_pred
y_pred = y_pred.reshape(y_pred.shape[0], 1, y_pred.shape[1])
#rebuild test set for inverse transform (the scaler expects all 13 columns, so each prediction is concatenated with its lag features)
pred_test_set = []
for index in range(0, len(y_pred)):
    print(np.concatenate([y_pred[index], X_test[index]], axis=1))
    pred_test_set.append(np.concatenate([y_pred[index], X_test[index]], axis=1))
[[-0.43999651 -0.7493947 0.43849976 -0.22579919 -0.5112761 -1.23279991 -0.47215006 1.3520978 -0.08583303 -0.63472921 0.24847465 1.42298905 -0.60688226]]
[[-0.48519281 0.98321898 -0.7493947 0.43849976 -0.22579919 -0.68718857 -1.23279991 -0.37922266 1.3520978 -0.08583303 -0.63472921 0.51252021 0.57248741]]
[[-0.50056571 0.00632112 0.98321898 -0.7493947 0.43849976 -0.36848216 -0.68718857 -1.27378407 -0.37922266 1.3520978 -0.08583303 -0.55747644 -0.01839466]]
[[-1.33720684 0.18877408 0.00632112 0.98321898 -0.7493947 0.37314116 -0.36848216 -0.63211846 -1.27378407 -0.37922266 1.3520978 0.10750828 -0.71280814]]
[[ 0.15739924 -0.98545772 0.18877408 0.00632112 0.98321898 -0.95302417 0.37314116 -0.25730414 -0.63211846 -1.27378407 -0.37922266 1.84955361 -0.28124198]]
[[ 0.4152523 -2.08431201 -0.98545772 0.18877408 0.00632112 0.98126567 -0.95302417 0.61488115 -0.25730414 -0.63211846 -1.27378407 -0.24793165 0.84932209]]
pred_test_set[0]
array([[-0.43999651, -0.7493947 , 0.43849976, -0.22579919, -0.5112761 ,
-1.23279991, -0.47215006, 1.3520978 , -0.08583303, -0.63472921,
0.24847465, 1.42298905, -0.60688226]])
#reshape pred_test_set
pred_test_set = np.array(pred_test_set)
pred_test_set = pred_test_set.reshape(pred_test_set.shape[0], pred_test_set.shape[2])
#inverse transform
pred_test_set_inverted = scaler.inverse_transform(pred_test_set)
Now we build a dataframe with dates and predictions. The inverse-transformed values are still month-over-month differences, so the predicted sell-in quantity for a month is the predicted difference plus the previous month's actual sales.
#create dataframe that shows the predicted sales
result_list = []
sales_dates = list(product_sales[-7:].monthly_date)
act_sales = list(product_sales[-7:].sell_in_qty)
for index in range(0, len(pred_test_set_inverted)):
    result_dict = {}
    # predicted level = predicted difference + previous month's actual sell-in quantity
    result_dict['pred_value'] = int(pred_test_set_inverted[index][0] + act_sales[index])
    result_dict['monthly_date'] = sales_dates[index + 1]
    result_list.append(result_dict)
pred_sales_result = pd.DataFrame(result_list)
pred_sales_result
|   | pred_value | monthly_date |
|---|---|---|
| 0 | 231211 | 2020-02-01 |
| 1 | 481381 | 2020-03-01 |
| 2 | 507340 | 2020-04-01 |
| 3 | 382066 | 2020-05-01 |
| 4 | 530704 | 2020-06-01 |
| 5 | 126409 | 2020-07-01 |
product_sales.head()
|   | monthly_date | sell_in_qty |
|---|---|---|
| 0 | 2017-12-01 | 6455.159707 |
| 1 | 2018-01-01 | 305853.773321 |
| 2 | 2018-02-01 | 355411.037870 |
| 3 | 2018-03-01 | 289141.505604 |
| 4 | 2018-04-01 | 276667.021410 |
#merge with actual sales in quantity dataframe
product_sales_pred = pd.merge(product_sales,pred_sales_result,on='monthly_date',how='left')
product_sales_pred
|   | monthly_date | sell_in_qty | pred_value |
|---|---|---|---|
| 0 | 2017-12-01 | 6455.159707 | NaN |
| 1 | 2018-01-01 | 305853.773321 | NaN |
| 2 | 2018-02-01 | 355411.037870 | NaN |
| 3 | 2018-03-01 | 289141.505604 | NaN |
| 4 | 2018-04-01 | 276667.021410 | NaN |
| 5 | 2018-05-01 | 317928.140667 | NaN |
| 6 | 2018-06-01 | 282536.496024 | NaN |
| 7 | 2018-07-01 | 270871.260218 | NaN |
| 8 | 2018-08-01 | 323579.714510 | NaN |
| 9 | 2018-09-01 | 316631.021185 | NaN |
| 10 | 2018-10-01 | 157359.243829 | NaN |
| 11 | 2018-11-01 | 295758.400315 | NaN |
| 12 | 2018-12-01 | 348175.282931 | NaN |
| 13 | 2019-01-01 | 323856.485921 | NaN |
| 14 | 2019-02-01 | 254740.442035 | NaN |
| 15 | 2019-03-01 | 456095.371668 | NaN |
| 16 | 2019-04-01 | 521940.245998 | NaN |
| 17 | 2019-05-01 | 428531.669332 | NaN |
| 18 | 2019-06-01 | 434096.404087 | NaN |
| 19 | 2019-07-01 | 698939.268621 | NaN |
| 20 | 2019-08-01 | 651601.935819 | NaN |
| 21 | 2019-09-01 | 442963.233835 | NaN |
| 22 | 2019-10-01 | 350025.409798 | NaN |
| 23 | 2019-11-01 | 324671.612968 | NaN |
| 24 | 2019-12-01 | 456584.470433 | NaN |
| 25 | 2020-01-01 | 307274.262608 | NaN |
| 26 | 2020-02-01 | 568144.374972 | 231211.0 |
| 27 | 2020-03-01 | 597742.923509 | 481381.0 |
| 28 | 2020-04-01 | 670535.527965 | 507340.0 |
| 29 | 2020-05-01 | 465339.577604 | 382066.0 |
| 30 | 2020-06-01 | 0.000000 | 530704.0 |
| 31 | 2020-07-01 | 0.000000 | 126409.0 |
So the predicted monthly sell-in quantity is 530,704 for June 2020 and 126,409 for July 2020. (The July figure should be read with caution: it is computed on top of June's placeholder actual of zero rather than on the June prediction.)
Let us plot these numbers and check how our model performed.
#plot actual and predicted
plot_data = [
go.Scatter(
x=product_sales_pred['monthly_date'],
y=product_sales_pred['sell_in_qty'],
name='actual'
),
go.Scatter(
x=product_sales_pred['monthly_date'],
y=product_sales_pred['pred_value'],
name='predicted'
)
]
plot_layout = go.Layout(
title='Sales in quantity Prediction'
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)
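Beyond the visual check, we can put a number on the fit over the months where both actuals and predictions exist (February to May 2020; June and July are excluded because their actuals are the null placeholders). This is a sketch; the `overlap` variable and the slicing are illustrative additions.

# RMSE over the overlapping months (2020-02 through 2020-05)
overlap = product_sales_pred.dropna(subset=['pred_value']).iloc[:-2]
rmse = np.sqrt(((overlap['sell_in_qty'] - overlap['pred_value']) ** 2).mean())
print(f"RMSE on Feb-May 2020: {rmse:,.0f}")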
Now let us work with the requested quantity dataframe. Again, we start by making the data stationary.
As before, we use differencing: we compute the change in requested quantity relative to the previous month and build the model on that difference.
product_requested_diff = product_requested.copy()
#add previous requested quantity to the next row
product_requested_diff['prev_request'] = product_requested_diff['requested_qty'].shift(1)
product_requested_diff.head()
|   | monthly_date | requested_qty | prev_request |
|---|---|---|---|
| 0 | 2017-12-01 | 6740.428571 | NaN |
| 1 | 2018-01-01 | 331772.000002 | 6740.428571 |
| 2 | 2018-02-01 | 444620.714280 | 331772.000002 |
| 3 | 2018-03-01 | 339028.428569 | 444620.714280 |
| 4 | 2018-04-01 | 308394.857150 | 339028.428569 |
#drop the null values and calculate the difference
product_requested_diff = product_requested_diff.dropna()
product_requested_diff['diff_1'] = (product_requested_diff['requested_qty'] - product_requested_diff['prev_request'])
product_requested_diff.head()
|   | monthly_date | requested_qty | prev_request | diff_1 |
|---|---|---|---|---|
| 1 | 2018-01-01 | 331772.000002 | 6740.428571 | 325031.571431 |
| 2 | 2018-02-01 | 444620.714280 | 331772.000002 | 112848.714278 |
| 3 | 2018-03-01 | 339028.428569 | 444620.714280 | -105592.285710 |
| 4 | 2018-04-01 | 308394.857150 | 339028.428569 | -30633.571419 |
| 5 | 2018-05-01 | 353002.285709 | 308394.857150 | 44607.428559 |
Now we have the dataframe for modelling the difference. Let us check for stationarity by plotting the data.
#plot request diff
plot_data = [
go.Scatter(
x=product_requested_diff['monthly_date'],
y=product_requested_diff['diff_1'],
)
]
plot_layout = go.Layout(
title='Requested Diff'
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)
Now the data looks stationary, so we can build the feature set. Again we use the previous monthly differences of the requested quantity, with a look-back period of 12.
#create new dataframe from transformation from time series to supervised
product_requested_supervised = product_requested_diff.drop(['prev_request'],axis=1)
#adding lags
for inc in range(1, 13):
    field_name = 'lag_' + str(inc)
    product_requested_supervised[field_name] = product_requested_supervised['diff_1'].shift(inc)
product_requested_supervised.head(10)
|   | monthly_date | requested_qty | diff_1 | lag_1 | lag_2 | lag_3 | lag_4 | lag_5 | lag_6 | lag_7 | lag_8 | lag_9 | lag_10 | lag_11 | lag_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2018-01-01 | 331772.000002 | 325031.571431 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2018-02-01 | 444620.714280 | 112848.714278 | 325031.571431 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2018-03-01 | 339028.428569 | -105592.285710 | 112848.714278 | 325031.571431 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2018-04-01 | 308394.857150 | -30633.571419 | -105592.285710 | 112848.714278 | 325031.571431 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 2018-05-01 | 353002.285709 | 44607.428559 | -30633.571419 | -105592.285710 | 112848.714278 | 325031.571431 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | 2018-06-01 | 303604.285704 | -49398.000005 | 44607.428559 | -30633.571419 | -105592.285710 | 112848.714278 | 325031.571431 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | 2018-07-01 | 287819.000000 | -15785.285704 | -49398.000005 | 44607.428559 | -30633.571419 | -105592.285710 | 112848.714278 | 325031.571431 | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | 2018-08-01 | 340184.857142 | 52365.857143 | -15785.285704 | -49398.000005 | 44607.428559 | -30633.571419 | -105592.285710 | 112848.714278 | 325031.571431 | NaN | NaN | NaN | NaN | NaN |
| 9 | 2018-09-01 | 349224.857135 | 9039.999993 | 52365.857143 | -15785.285704 | -49398.000005 | 44607.428559 | -30633.571419 | -105592.285710 | 112848.714278 | 325031.571431 | NaN | NaN | NaN | NaN |
| 10 | 2018-10-01 | 167510.428577 | -181714.428558 | 9039.999993 | 52365.857143 | -15785.285704 | -49398.000005 | 44607.428559 | -30633.571419 | -105592.285710 | 112848.714278 | 325031.571431 | NaN | NaN | NaN |
#drop null values
product_requested_supervised = product_requested_supervised.dropna().reset_index(drop=True)
product_requested_supervised
|   | monthly_date | requested_qty | diff_1 | lag_1 | lag_2 | lag_3 | lag_4 | lag_5 | lag_6 | lag_7 | lag_8 | lag_9 | lag_10 | lag_11 | lag_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-01-01 | 337229.000003 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 | 9039.999993 | 52365.857143 | -15785.285704 | -49398.000005 | 44607.428559 | -30633.571419 | -105592.285710 | 112848.714278 | 325031.571431 |
| 1 | 2019-02-01 | 251940.857152 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 | 9039.999993 | 52365.857143 | -15785.285704 | -49398.000005 | 44607.428559 | -30633.571419 | -105592.285710 | 112848.714278 |
| 2 | 2019-03-01 | 533232.571421 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 | 9039.999993 | 52365.857143 | -15785.285704 | -49398.000005 | 44607.428559 | -30633.571419 | -105592.285710 |
| 3 | 2019-04-01 | 623762.285726 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 | 9039.999993 | 52365.857143 | -15785.285704 | -49398.000005 | 44607.428559 | -30633.571419 |
| 4 | 2019-05-01 | 484507.285697 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 | 9039.999993 | 52365.857143 | -15785.285704 | -49398.000005 | 44607.428559 |
| 5 | 2019-06-01 | 503502.142879 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 | 9039.999993 | 52365.857143 | -15785.285704 | -49398.000005 |
| 6 | 2019-07-01 | 724598.428563 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 | 9039.999993 | 52365.857143 | -15785.285704 |
| 7 | 2019-08-01 | 614254.285714 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 | 9039.999993 | 52365.857143 |
| 8 | 2019-09-01 | 492620.428564 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 | 9039.999993 |
| 9 | 2019-10-01 | 383541.714305 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 | -181714.428558 |
| 10 | 2019-11-01 | 311651.142861 | -71890.571444 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 | 138411.428572 |
| 11 | 2019-12-01 | 471218.000003 | 159566.857141 | -71890.571444 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 | 55468.000000 |
| 12 | 2020-01-01 | 363873.285724 | -107344.714278 | 159566.857141 | -71890.571444 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 | -24160.857146 |
| 13 | 2020-02-01 | 508112.714284 | 144239.428559 | -107344.714278 | 159566.857141 | -71890.571444 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 | -85288.142851 |
| 14 | 2020-03-01 | 690966.428568 | 182853.714284 | 144239.428559 | -107344.714278 | 159566.857141 | -71890.571444 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 | 281291.714269 |
| 15 | 2020-04-01 | 839775.571424 | 148809.142856 | 182853.714284 | 144239.428559 | -107344.714278 | 159566.857141 | -71890.571444 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 | 90529.714305 |
| 16 | 2020-05-01 | 584261.428579 | -255514.142844 | 148809.142856 | 182853.714284 | 144239.428559 | -107344.714278 | 159566.857141 | -71890.571444 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 | -139255.000029 |
| 17 | 2020-06-01 | 0.000000 | -584261.428579 | -255514.142844 | 148809.142856 | 182853.714284 | 144239.428559 | -107344.714278 | 159566.857141 | -71890.571444 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 | 18994.857182 |
| 18 | 2020-07-01 | 0.000000 | 0.000000 | -584261.428579 | -255514.142844 | 148809.142856 | 182853.714284 | 144239.428559 | -107344.714278 | 159566.857141 | -71890.571444 | -109078.714259 | -121633.857150 | -110344.142849 | 221096.285684 |
Now our features are ready. Let us check how well the lag features (lag_1 through lag_12) explain the variation in the differenced series (diff_1). This can be measured with Adjusted R-squared: we fit a linear regression of diff_1 on the twelve lags and read off the adjusted R-squared of the fit.
# Import statsmodels.formula.api
import statsmodels.formula.api as smf
# Define the regression formula
model = smf.ols(formula='diff_1 ~ lag_1 + lag_2 + lag_3 + lag_4 + lag_5 + lag_6 + lag_7 + lag_8 + lag_9 + lag_10 + lag_11 + lag_12 ', data=product_requested_supervised)
# Fit the regression
model_fit = model.fit()
# Extract the adjusted r-squared
regression_adj_rsq = model_fit.rsquared_adj
print(regression_adj_rsq)
0.0781291635143917
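The adjusted R-squared is low (about 0.078), so the lag features explain only a small share of the variation in diff_1. As a quick cross-check, here is a minimal sketch (reusing the model_fit object from above) that recomputes the adjusted R-squared by hand from the ordinary R-squared:
# Illustrative cross-check: adj R² = 1 - (1 - R²)(n - 1)/(n - p - 1),
# with n observations and p = 12 lag predictors
n = int(model_fit.nobs)
p = 12
adj_rsq = 1 - (1 - model_fit.rsquared) * (n - 1) / (n - p - 1)
print(adj_rsq)  # should match model_fit.rsquared_adj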
We use MinMaxScaler to scale each feature to the range [-1, 1].
#import MinMaxScaler and create a new dataframe for LSTM model
from sklearn.preprocessing import MinMaxScaler
product_requested_model = product_requested_supervised.drop(['requested_qty','monthly_date'],axis=1)
Now we split the dataset into training and test sets; the last six rows (February through July 2020, including the two months we want to forecast) form the test set.
#split train and test set
Train_set, Test_set = product_requested_model[0:-6].values, product_requested_model[-6:].values
product_requested_model.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   diff_1  19 non-null     float64
 1   lag_1   19 non-null     float64
 2   lag_2   19 non-null     float64
 3   lag_3   19 non-null     float64
 4   lag_4   19 non-null     float64
 5   lag_5   19 non-null     float64
 6   lag_6   19 non-null     float64
 7   lag_7   19 non-null     float64
 8   lag_8   19 non-null     float64
 9   lag_9   19 non-null     float64
 10  lag_10  19 non-null     float64
 11  lag_11  19 non-null     float64
 12  lag_12  19 non-null     float64
dtypes: float64(13)
memory usage: 2.1 KB
# Apply MinMaxScaler, fitting on the training set only to avoid leakage
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler = scaler.fit(Train_set)
# Ensure the training set is 2-D (samples, features) before transforming
Train_set = Train_set.reshape(Train_set.shape[0], Train_set.shape[1])
Train_set_scaled = scaler.transform(Train_set)
# Ensure the test set is 2-D (samples, features) before transforming
Test_set = Test_set.reshape(Test_set.shape[0], Test_set.shape[1])
Test_set_scaled = scaler.transform(Test_set)
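As a quick sanity check (a minimal sketch, not part of the original notebook): features in the scaled training set should lie within [-1, 1], while scaled test rows can fall outside that range because the scaler was fit on the training set only.
# Training features lie in [-1, 1]; test rows may exceed this range
print(Train_set_scaled.min(), Train_set_scaled.max())
print(Test_set_scaled.min(), Test_set_scaled.max())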
Now we create feature and label sets from the scaled data: the first column (the differenced target diff_1) is the label, the twelve lag columns are the features, and the features are reshaped into the 3-D form the LSTM expects.
X_train, y_train = Train_set_scaled[:, 1:], Train_set_scaled[:, 0:1]
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test, y_test = Test_set_scaled[:, 1:], Test_set_scaled[:, 0:1]
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
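Keras LSTMs expect input of shape (samples, timesteps, features); here each sample is a single month with one timestep of twelve lag features. A quick shape check (illustrative) confirms this:
# Expected shapes: X_train (13, 1, 12), y_train (13, 1), X_test (6, 1, 12)
print(X_train.shape, y_train.shape, X_test.shape)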
Now let us fit our LSTM model.
# Keras imports for the model (these were not shown earlier)
from keras.models import Sequential
from keras.layers import Dense, LSTM

# A stateful LSTM with 4 units, trained one sample at a time in time order
model = Sequential()
model.add(LSTM(4, batch_input_shape=(1, X_train.shape[1], X_train.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=200, batch_size=1, verbose=1, shuffle=False)
Epoch 1/200
13/13 [==============================] - 2s 1ms/step - loss: 0.4439
Epoch 2/200
13/13 [==============================] - 0s 1ms/step - loss: 0.4255
Epoch 3/200
13/13 [==============================] - 0s 1ms/step - loss: 0.4124
...
Epoch 199/200
13/13 [==============================] - 0s 1ms/step - loss: 1.0687e-04
Epoch 200/200
13/13 [==============================] - 0s 1ms/step - loss: 9.9121e-05
(training log truncated; the loss decreases steadily from 0.4439 to 9.9121e-05 over the 200 epochs)
<keras.callbacks.History at 0x7fe1ff996e80>
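For diagnostics, it helps to capture the History object that fit() returns; a minimal sketch (assuming the fit call above is rewritten to keep its return value) that plots the per-epoch loss with the matplotlib import from the start of the notebook:
# Capture the History object so the loss curve can be plotted
history = model.fit(X_train, y_train, epochs=200, batch_size=1, verbose=0, shuffle=False)
plt.plot(history.history['loss'])
plt.xlabel('epoch')
plt.ylabel('MSE loss')
plt.title('LSTM training loss')
plt.show()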
Now let us run predictions on the test set and look at the results.
Y_pred = model.predict(X_test,batch_size=1)
Y_pred
array([[-0.5650983 ],
[-0.15382557],
[ 0.03194261],
[-0.61628133],
[-0.18608946],
[ 1.0579216 ]], dtype=float32)
y_test
array([[ 0.34821849],
[ 0.531857 ],
[ 0.36995075],
[-1.55289526],
[-3.11632341],
[-0.33774301]])
These values are still on the scaled, differenced scale, so they are hard to interpret directly; we need to invert the transformations to recover the actual requested-quantity predictions.
# Reshape Y_pred to 3-D so each prediction can be concatenated with its features
Y_pred = Y_pred.reshape(Y_pred.shape[0], 1, Y_pred.shape[1])
# Rebuild rows with the full feature width so the scaler can invert them
pred_test_set = []
for index in range(0, len(Y_pred)):
    print(np.concatenate([Y_pred[index], X_test[index]], axis=1))
    pred_test_set.append(np.concatenate([Y_pred[index], X_test[index]], axis=1))
[[-0.56509829 -0.8482438   0.42111136 -0.52560518 -0.68624298 -0.74047614 -0.69170912  0.73998    -0.13301675 -0.81659237  0.17598501  1.8926507  -0.61942951]]
[[-0.15382557  0.34821849 -0.8482438   0.47419766 -0.52560518 -0.68624298 -0.74047614 -0.69170912  0.73998    -0.13301675 -0.81659237  0.70085694  0.8273697 ]]
[[ 0.03194261  0.531857    0.34821849 -0.6787528   0.47419766 -0.52560518 -0.68624298 -0.74047614 -0.69170912  0.73998    -0.13301675 -0.7347329   0.07447969]]
[[-0.61628133  0.36995075  0.531857    0.40798934 -0.6787528   0.47419766 -0.52560518 -0.68624298 -0.74047614 -0.69170912  0.73998     0.25393986 -0.83242323]]
[[-0.18608946 -1.55289526  0.36995075  0.5747875   0.40798934 -0.6787528   0.47419766 -0.52560518 -0.68624298 -0.74047614 -0.69170912  1.51657719 -0.20785054]]
[[ 1.05792165 -3.11632341 -1.55289526  0.42772867  0.5747875   0.40798934 -0.6787528   0.47419766 -0.52560518 -0.68624298 -0.74047614 -0.55411108  0.58979336]]
pred_test_set[0]
array([[-0.56509829, -0.8482438 , 0.42111136, -0.52560518, -0.68624298,
-0.74047614, -0.69170912, 0.73998 , -0.13301675, -0.81659237,
0.17598501, 1.8926507 , -0.61942951]])
#reshape pred_test_set
pred_test_set = np.array(pred_test_set)
pred_test_set = pred_test_set.reshape(pred_test_set.shape[0], pred_test_set.shape[2])
#inverse transform
pred_test_set_inverted = scaler.inverse_transform(pred_test_set)
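The first column of each inverted row is the predicted month-over-month difference on the original scale; a quick look (illustrative):
# Predicted differences (original scale), one per test month
print(pred_test_set_inverted[:, 0])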
Now we build a dataframe with dates and predictions. Because the model predicts the month-over-month difference, the predicted quantity for each month is the inverted difference added back onto the previous month's actual requested quantity.
# Create a dataframe of predicted requested quantities: each predicted
# difference is added back onto the previous month's actual value
result_list = []
requested_dates = list(product_requested[-7:].monthly_date)
act_requested = list(product_requested[-7:].requested_qty)
for index in range(0, len(pred_test_set_inverted)):
    result_dict = {}
    result_dict['pred_value'] = int(pred_test_set_inverted[index][0] + act_requested[index])
    result_dict['monthly_date'] = requested_dates[index+1]
    result_list.append(result_dict)
pred_requested_result = pd.DataFrame(result_list)
pred_requested_result
|   | pred_value | monthly_date |
|---|---|---|
| 0 | 316066 | 2020-02-01 |
| 1 | 546785 | 2020-03-01 |
| 2 | 768701 | 2020-04-01 |
| 3 | 781206 | 2020-05-01 |
| 4 | 616150 | 2020-06-01 |
| 5 | 293471 | 2020-07-01 |
product_requested.head()
|   | monthly_date | requested_qty |
|---|---|---|
| 0 | 2017-12-01 | 6740.428571 |
| 1 | 2018-01-01 | 331772.000002 |
| 2 | 2018-02-01 | 444620.714280 |
| 3 | 2018-03-01 | 339028.428569 |
| 4 | 2018-04-01 | 308394.857150 |
#merge with actual requested quantity dataframe
product_requested_pred = pd.merge(product_requested,pred_requested_result,on='monthly_date',how='left')
product_requested_pred
|   | monthly_date | requested_qty | pred_value |
|---|---|---|---|
| 0 | 2017-12-01 | 6740.428571 | NaN |
| 1 | 2018-01-01 | 331772.000002 | NaN |
| 2 | 2018-02-01 | 444620.714280 | NaN |
| 3 | 2018-03-01 | 339028.428569 | NaN |
| 4 | 2018-04-01 | 308394.857150 | NaN |
| 5 | 2018-05-01 | 353002.285709 | NaN |
| 6 | 2018-06-01 | 303604.285704 | NaN |
| 7 | 2018-07-01 | 287819.000000 | NaN |
| 8 | 2018-08-01 | 340184.857142 | NaN |
| 9 | 2018-09-01 | 349224.857135 | NaN |
| 10 | 2018-10-01 | 167510.428577 | NaN |
| 11 | 2018-11-01 | 305921.857150 | NaN |
| 12 | 2018-12-01 | 361389.857150 | NaN |
| 13 | 2019-01-01 | 337229.000003 | NaN |
| 14 | 2019-02-01 | 251940.857152 | NaN |
| 15 | 2019-03-01 | 533232.571421 | NaN |
| 16 | 2019-04-01 | 623762.285726 | NaN |
| 17 | 2019-05-01 | 484507.285697 | NaN |
| 18 | 2019-06-01 | 503502.142879 | NaN |
| 19 | 2019-07-01 | 724598.428563 | NaN |
| 20 | 2019-08-01 | 614254.285714 | NaN |
| 21 | 2019-09-01 | 492620.428564 | NaN |
| 22 | 2019-10-01 | 383541.714305 | NaN |
| 23 | 2019-11-01 | 311651.142861 | NaN |
| 24 | 2019-12-01 | 471218.000003 | NaN |
| 25 | 2020-01-01 | 363873.285724 | NaN |
| 26 | 2020-02-01 | 508112.714284 | 316066.0 |
| 27 | 2020-03-01 | 690966.428568 | 546785.0 |
| 28 | 2020-04-01 | 839775.571424 | 768701.0 |
| 29 | 2020-05-01 | 584261.428579 | 781206.0 |
| 30 | 2020-06-01 | 0.000000 | 616150.0 |
| 31 | 2020-07-01 | 0.000000 | 293471.0 |
So the predicted monthly requested quantity is 616,150 for June 2020 and 293,471 for July 2020.
Let us plot these numbers and check how the model performed.
# Plotly imports for the chart (these were not imported earlier)
import plotly.graph_objs as go
import plotly.offline as pyoff

# Plot actual vs. predicted requested quantity
plot_data = [
    go.Scatter(
        x=product_requested_pred['monthly_date'],
        y=product_requested_pred['requested_qty'],
        name='actual'
    ),
    go.Scatter(
        x=product_requested_pred['monthly_date'],
        y=product_requested_pred['pred_value'],
        name='predicted'
    )
]
plot_layout = go.Layout(
    title='Requested quantity Prediction'
)
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)
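As a final, rough check (a sketch not in the original analysis): for February through May 2020 both actuals and predictions exist, so we can compute simple error metrics such as RMSE and MAPE on those four months.
# Illustrative evaluation on the months with both actuals and predictions;
# June/July actuals are stored as 0 for the null months, so filter them out
eval_rows = product_requested_pred.dropna(subset=['pred_value'])
eval_rows = eval_rows[eval_rows['requested_qty'] > 0]
errors = eval_rows['requested_qty'] - eval_rows['pred_value']
rmse = np.sqrt((errors ** 2).mean())
mape = (errors.abs() / eval_rows['requested_qty']).mean() * 100
print(f"RMSE: {rmse:,.0f}, MAPE: {mape:.1f}%")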